Data-driven emotion conversion in spoken English

نویسندگان

  • Zeynep Inanoglu
  • Steve J. Young
چکیده

This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods have been developed which utilise a small amount of expressive training data ( 15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system performs F0 conversion at the syllable level while duration conversion takes place at the phone level using a set of linguistic regression trees. Two alternative methods are presented as a means to predict F0 contours for unseen utterances. Firstly, an HMM-based approach uses syllables as linguistic building blocks to model and generate F0 contours. Secondly, an F0 segment selection approach expresses F0 conversion as a search problem, where syllable-based F0 contour segments from a target speech corpus are spliced together under contextual constraints. To complement the prosody modules, a GMM-based spectral conversion function is used to transform the voice quality. Each independent module and the combined emotion conversion framework were evaluated through a perceptual study. Preference tests demonstrated that each module contributes a measurable improvement in the perception of the target emotion. Furthermore, an emotion classification test showed that converted utterances with either F0 generation technique were able to convey the desired emotion above chance level. However, F0 segment selection outperforms the HMM-based F0 generation method both in terms of emotion recognition rates as well as intonation quality scores, particularly in the case of anger and surprise. Using segment selection, the emotion recognition rates for the converted neutral utterances were comparable to the same utterances spoken directly in the target emotion. 2008 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Uniication-based Grammars Using the Spoken English Corpus

This paper describes a grammar learning system that combines model-based and data-driven learning within a single framework. Our results from learning grammars using the Spoken English Corpus (SEC) suggest that combined model-based and data-driven learning can produce a more plausible grammar than is the case when using either learning style in isolation.

متن کامل

Learning Unification-Based Grammars Using the Spoken English Corpus

This paper describes a grammar learning system that combines model-based and data-driven learning within a single framework. Our results from learning grammars using the Spoken English Corpus (SEC) suggest that combined model-based and data-driven learning can produce a more plausible grammar than is the case when using either learning style in isolation.

متن کامل

Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms

In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...

متن کامل

Emotion conversion using F0 segment selection

This paper describes F0 segment selection, a novel syllablebased F0 conversion method, which provides a concatenative framework to search for F0 segments in a modest corpus of emotional speech (∼15 minutes of data). The method is compared with our earlier work on F0 generation using contextsensitive syllable HMMs. Both methods are complemented with a duration conversion module as well as GMM-ba...

متن کامل

Phoneme-to-grapheme mapping for spoken inquiries to the semantic web

Automatic methods for grapheme-to-phoneme (G2P) and phoneme-to-grapheme (P2G) conversion have become very popular in recent years. Their performance has improved considerably, while at the same time these developments required less input from expert lexicographers. Continuing in this tradition we will present in this paper a data-driven, language-independent approach called MASSIVE with which i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Speech Communication

دوره 51  شماره 

صفحات  -

تاریخ انتشار 2009